ABSTRACT


Classical significance tests depend upon a choice of significance level
and also seem overready to reject the null hypothesis when the sample size is
large. A plausible alternative to significance testing has been suggested by
Schwarz (1978) in the context of model discrimination. A simpler and more
general formulation is discussed here; this leads to more precise
approximations. For a single parameter and when the sample size $n$ is large we
recommend viewing the data as supporting a simple null hypothesis versus a
completely composite alternative whenever the maximum likelihood estimate
lies within an adjustment to $\sqrt{\log n}$ approximate standard deviations of
the null hypothesis. This criterion indeed appears to provide an attractive rule
of thumb for all sample sizes: it removes the need for tables of significance
levels and becomes less keen to reject the null hypothesis for large sample
sizes. The ideas are extended to provide alternatives to multivariate
likelihood ratio tests, and to the chi-squared goodness of fit test.
SIGNIFICANCE AND EXPLANATION


Significance  tests  are  commonly  used  in  many  application  areas  as  attempts 
to  formally  confirm  or  refute  specific  conclusions.  For  example,  in  the  social 
sciences  (e.g.  psychology,  sociology,  and  econometrics)  there  is  often  much 
more emphasis on data-fitting and seeking "significant" results than on
developing proper mathematical models which relate in an inductively sensible
way to the real-life problem. However, significance tests possess little
formal justification in the literature for making specific decisions as to
whether a particular hypothesis is true.

In the present paper a new formulation is used to demonstrate that
significance tests tend to be much too ready to reject the null hypothesis for
large sample sizes. It is recommended that the usual percentage points should be
replaced by quantities depending in a particular way upon sample size, but not
upon a choice of significance level. The phenomena discussed would appear to
be particularly relevant to the area of scientific reporting. For example,
many results in applied journals which might have been viewed as "significant,"
because they yield a low p-value, may in fact serve to detract from the very
scientific theory which they claim to substantiate.

For  large  sample  sizes,  the  techniques  proposed  in  this  paper  permit  a 
larger  range  of  viable  null  hypotheses  than  experienced  under  fixed-size 
significance  testing.  It  should  therefore  be  easier  to  use  them  to  find  a 
data-credible  model  which  is  also  reasonable  in  real-life  terms. 
WHY DO WE NEED SIGNIFICANCE LEVELS?


Tom  Leonard* 

1.  Introduction 

Consider firstly the situation where a single parameter $\theta$ possesses a
likelihood function $\ell(\theta \mid x)$ depending upon a vector
$x = (x_1, \ldots, x_n)$ of $n$ observations, and where $\theta$ assumes values
in a continuous parameter space $\Theta$. We assume that it is desired to test
the simple null hypothesis $H_0 : \theta = \theta_0$ against the composite
alternative $H_1 : \theta \neq \theta_0$, or to find some viable alternative to
this procedure.

There  are  two  main  questions  which  the  statistician  may  wish  to  ask 
himself  in  this  situation,  namely 

(A) Should the information in the data have a positive or negative influence
upon his judgement about the truth of $H_0$?

(B) Is the information in the data in sufficient conflict with $H_0$ to suggest
that he should take the step of rejecting $H_0$?

It appears to us that question (B) can only be adequately answered within
a decision-making framework where the statistician assigns utilities to $\theta$,
under $H_0$ and $H_1$; see for example Dickey (1968, 1975) and one of the methods
discussed in section 4. We would indeed go so far as to view it as unfair to
expect classical significance tests to formally cope with (B). It should be
remembered that significance tests were originally introduced as inductive
tools for the working statistician and were viewed as alternatives to the
decision-theoretic approach of Wald. For example, the sampling probabilities
obtained could be interpreted in the context of the real-life problem.
Significance tests possess little sensible justification in the literature for
formally coping with decisions regarding actual acceptance or rejection, and
should therefore not be expected to provide an adequate formal answer to the
decision-making problem. In section 4 it will indeed be noted that they
provide very different answers to those suggested by a sensibly formulated
decision-theoretic approach.

Question  (A)  seems  to  be  of  frequent  importance  to  statisticians  thinking 
inductively  about  their  data  sets;  this  is  probably  one  question  which  many 
statisticians  would  like  to  answer  when  they  employ  significance  tests.  The 
main  claim  of  this  paper  is  however  that  classical  significance  testing  should 
not  be  used  to  formally  answer  (A)  or  (B)  unless  the  size  of  the  test  is 
permitted to depend upon the size $n$ of the sample in a particular way. This
might  lead  the  reader  to  question  the  usefulness  of  significance  tests.  In  our 
opinion  the  latter  assume  an  over-prominent  position  in  statistical  methodology. 
In  the  long  term,  substantial  changes  to  the  teaching  of  significance  tests, 
and  to  their  widespread  applications  e.g.  in  the  social  sciences,  might  perhaps 
be  beneficial. 

A simple criterion is now introduced for answering (A). Suppose that the
statistician possesses a prior density $\pi(\theta)$ for $\theta$ and denote the
corresponding posterior density by $\pi(\theta \mid x)$. Then consider the
following definition.

Definition: The data support $H_0$ with respect to $\pi$ if

$$\pi(\theta_0 \mid x) > \pi(\theta_0). \tag{1.1}$$

This provides a natural criterion for answering (A), as long as the prior
can be specified; note that if the data support $H_0$ then the probability of
$\theta$ lying within a small neighbourhood of $\theta_0$ will be increased by the
information provided by the data.
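
Definition (1.1) can be checked exactly whenever the posterior density is available in closed form. The following is a minimal sketch, assuming binomial data with a conjugate Beta(a, b) prior; that model choice, the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def data_support_h0(theta0, successes, n, a=1.0, b=1.0):
    """Definition (1.1): the data support H0 when the posterior density at
    theta0 exceeds the prior density at theta0 (Beta prior assumed)."""
    prior = beta_pdf(theta0, a, b)
    posterior = beta_pdf(theta0, a + successes, b + n - successes)
    return posterior > prior

# Hypothetical data: 52 and then 70 successes in 100 trials, H0: theta = 0.5.
print(data_support_h0(0.5, 52, 100))
print(data_support_h0(0.5, 70, 100))
```

With a uniform prior the comparison reduces to asking whether the posterior density at the hypothesised value exceeds one.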

One of the few criticisms that can be levelled at (1.1) by supporters of
any philosophy of statistics is that it is dependent upon the choice of prior
$\pi$ for $\theta$. We will however demonstrate that as $n$ increases the effect
of the prior decreases, leading to a sensible approximate procedure which is
completely free from the choice of prior distribution. For small sample sizes we
feel that our asymptotic procedure will still provide a useful rule of thumb in
situations where the prior information about $\theta$ is fuzzy and difficult to
specify.
2. Asymptotic Results

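As $n$ gets large the posterior distribution of $\theta$ will (typically) be
asymptotically normal with mean equal to the maximum likelihood estimate
$\hat\theta$ of $\theta$ and variance $v/n$ where

$$n v^{-1} = -\left.\frac{\partial^2 \log \ell(\theta \mid x)}{\partial \theta^2}\right|_{\theta = \hat\theta}. \tag{2.1}$$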


Precise regularity conditions for this approximation, based upon the idea
of supercontinuous likelihoods, are described by DeGroot (1970, p. 210).
Substituting the corresponding approximation for $\pi(\theta_0 \mid x)$ in (1.1)
tells us that as $n \to \infty$ the data support $H_0$ whenever

$$(2\pi v/n)^{-1/2} \exp\{-\tfrac{1}{2} n v^{-1} (\theta_0 - \hat\theta)^2\} > \pi(\theta_0). \tag{2.2}$$


A slight rearrangement leads to

$$\frac{|\hat\theta - \theta_0|}{n^{-1/2} v^{1/2}} < \{\log n - \log(2\pi v) - 2\log \pi(\theta_0)\}^{1/2} \sim \{\log n - \log(2\pi v)\}^{1/2} \qquad (n \to \infty). \tag{2.3}$$

The data will therefore support $H_0$ with respect to any prior distribution
$\pi$ if $n$ is large enough and $\hat\theta$ lies within an adjustment to
$\sqrt{\log n}$ approximate standard deviations $\sqrt{v/n}$ of the hypothesised
value $\theta_0$. Whilst the adjustment term $\log(2\pi v)$ will often tend to a
constant as $n \to \infty$ it should usually be included as it may be quite
sizeable.
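
The large-sample rule in (2.3) can be sketched in a few lines. This is an illustrative sketch only: it drops the prior term as the asymptotic argument does, it works with squared quantities so that a non-positive bound is handled without taking a square root, and the function name and the Bernoulli example (where v is taken as the estimated proportion times its complement) are assumptions made for the illustration.

```python
import math

def supports_null(theta_hat, theta0, v, n):
    """Approximate criterion (2.3) with the prior term omitted.

    v / n is the approximate variance of the maximum likelihood estimate.
    Returns (z_squared, bound_squared, supported), where the data are taken
    to support H0 when z_squared < bound_squared.
    """
    z_squared = n * (theta_hat - theta0) ** 2 / v
    bound_squared = math.log(n) - math.log(2.0 * math.pi * v)
    return z_squared, bound_squared, z_squared < bound_squared

# Hypothetical Bernoulli sample: proportion 0.51 observed in n = 10,000 trials.
p_hat = 0.51
z2, bound2, ok = supports_null(p_hat, 0.5, p_hat * (1 - p_hat), 10_000)
print(f"z^2 = {z2:.3f}, bound^2 = {bound2:.3f}, data support H0: {ok}")
```

In this example the estimate lies about two standard deviations from the hypothesised value, so a fixed 5% test would reject, whilst the bound depending on $n$ is not exceeded.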

The condition in (2.3) provides an interesting alternative to classical
significance tests when the sample size is large. For example, if $n = 1000$
we have $\sqrt{\log n} = 2.63$, whilst for $n = 10{,}000$, $\sqrt{\log n}$
increases to 3.03. Therefore, for sample sizes up to 10,000, and in special
cases where $\log(2\pi v)$ is negligible, this procedure effectively fixes the
significance level of the standard test (based upon the normal approximation
described above) to a value depending upon $n$ but less than the 0.005% level.
For sample sizes higher than 10,000 we are really saying that none of the
standard significance levels are appropriate to this situation. Indeed,
standard test procedures will frequently reject $H_0$ at any sensible
significance level in situations where our procedure suggests that the data
support $H_0$. Therefore, as well as showing that standard test procedures do
not sensibly answer question (A) according to our own criterion, we have
demonstrated in a simple and direct way that they should not really be expected
to adequately answer (B).
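
The arithmetic behind these cut-off values is easy to reproduce. The sketch below, which ignores the $\log(2\pi v)$ adjustment and assumes a two-sided test based on the normal approximation, tabulates $\sqrt{\log n}$ together with the normal tail area beyond it; the function name and the chosen sample sizes are illustrative.

```python
import math

def implied_two_sided_level(n):
    """Return sqrt(log n) and the two-sided standard normal tail area
    beyond that many standard deviations."""
    cutoff = math.sqrt(math.log(n))
    level = math.erfc(cutoff / math.sqrt(2.0))  # equals 2 * (1 - Phi(cutoff))
    return cutoff, level

for n in (50, 100, 1000, 10_000, 1_000_000):
    cutoff, level = implied_two_sided_level(n)
    print(f"n = {n:>9,d}   sqrt(log n) = {cutoff:.2f}   two-sided level = {level:.4f}")
```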

When $n$ is small, the condition in (2.3) becomes less adequate in a
formal sense. We however feel that it still provides a useful "rule of thumb"
for the inductive statistician in situations where his prior $\pi$ is difficult
to specify. It certainly seems no worse than the standard convention which
requires him to investigate, for any sample size, whether $\hat\theta$ lies
within two standard deviations of $\theta_0$. The criterion enables him to make
a judgement which, whilst not completely precise for any particular sample size,
at least varies in a sensible way for different sample sizes.

The adequacy of the approximation in (2.3) is open to some speculation on
the grounds that the prior density $\pi$ may be heaped around
$\theta = \theta_0$, in which case the term $\log \pi(\theta_0)$ cannot be
neglected. However, in section 4 we will employ an analogy with Dickey's "sharp
null hypothesis testing" to show that a spike at $\theta = \theta_0$ would in
fact have negligible effect upon the accuracy of our approximate criterion.
3. Likelihood Ratio Tests


We next modify the results of the previous section to cover any likelihood
ratio procedure for testing $H_0 : \theta = \theta_0$ against
$H_1 : \theta \neq \theta_0$. Expanding the log-likelihood of $\theta$ in a
Taylor series about $\theta = \hat\theta$, truncating after the quadratic term,
and taking exponentials, yields the asymptotic approximation

$$\ell(\theta \mid x) \sim \ell(\theta \mid x)\big|_{\theta = \hat\theta} \exp\{-\tfrac{1}{2} n v^{-1} (\theta - \hat\theta)^2\} \qquad (n \to \infty) \tag{3.1}$$

where $v$ is defined in (2.1).

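The left hand side of (2.2) represents our asymptotic approximation to the
posterior density $\pi(\theta \mid x)$ of $\theta$. Hence (3.1) gives, after some
elementary manipulation,

$$\lambda(\theta \mid x) \sim (2\pi v/n)^{1/2}\, \pi(\theta \mid x) \tag{3.2}$$

where

$$\lambda(\theta \mid x) = \ell(\theta \mid x) \big/ \ell(\theta \mid x)\big|_{\theta = \hat\theta} \tag{3.3}$$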

represents the likelihood ratio. Note that (3.2) provides, as a subsidiary
result, a simple demonstration of the asymptotic behaviour of the likelihood
ratio in terms of the posterior density. It follows immediately that the
condition in (1.1) is asymptotically equivalent to

$$-2 \log \lambda(\theta_0 \mid x) < \log n - \log(2\pi v) \qquad (n \to \infty). \tag{3.4}$$


This result may be compared with the standard likelihood ratio test
which, under $H_0$, takes $-2 \log \lambda$ to possess a distribution which is
asymptotically chi-squared with a single degree of freedom. The $\log n$
contribution is similar in spirit to a result described by Schwarz (1978) in
the context of estimating the dimension of a model. Our formulation is however
much broader, and the proofs simpler. Schwarz's work is also related to the
approach described by Lindley (1961).
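
A small numerical comparison may help to fix ideas about (3.4). The sketch below assumes a Bernoulli sample, for which the likelihood ratio is available exactly and the quantity $v$ of (2.1) equals the estimated proportion times its complement; the function names and the example counts are illustrative assumptions, not part of the original argument.

```python
import math

def neg2_log_lr_bernoulli(k, n, theta0):
    """Exact -2 log(likelihood ratio) for H0: theta = theta0, given k
    successes in n Bernoulli trials (requires 0 < k < n)."""
    p = k / n
    log_lr = (k * math.log(theta0 / p)
              + (n - k) * math.log((1.0 - theta0) / (1.0 - p)))
    return -2.0 * log_lr

def lr_supports_null(k, n, theta0):
    """Compare -2 log(lambda) with the bound log n - log(2*pi*v) of (3.4)."""
    p = k / n
    v = p * (1.0 - p)  # n / v is the observed information at the MLE
    bound = math.log(n) - math.log(2.0 * math.pi * v)
    stat = neg2_log_lr_bernoulli(k, n, theta0)
    return stat, bound, stat < bound

# Hypothetical data: 5,100 successes in 10,000 trials, H0: theta = 0.5.
stat, bound, ok = lr_supports_null(5100, 10_000, 0.5)
print(f"-2 log lambda = {stat:.3f}, bound = {bound:.3f}, data support H0: {ok}")
```

Here the statistic exceeds 3.84, the 5% point of chi-squared with one degree of freedom, yet falls below the bound in (3.4).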


In the special case where $x_1, \ldots, x_n$ constitute a random sample
from a distribution with parameter $\theta$, it is well-known that as
$n \to \infty$ the likelihood ratio test will reject $H_0$ with sampling
probability one. For any fixed large $n$, the criterion in (3.4) becomes less
inclined to recommend against $H_0$ than under a fixed-size test. However, in
the extreme limit as $n \to \infty$ it will retain a property which is similar
in spirit to that of the likelihood ratio test, i.e. as $n \to \infty$ a random
sample will support $H_0$ with sampling probability zero (this follows
essentially because $n^{-1} \log n$ approaches zero in the limit and $v$
approaches a constant value in this special case). Our criterion therefore
enables us to replace the classical significance level by values which depend
upon sample size in a conservative enough manner to preserve a sensible property
as $n \to \infty$. This fits in with our general philosophy that all models are
ultimately wrong, i.e. given an arbitrarily large amount of available data, in
the form of a random sample, any particular model will ultimately become
inadequate.


4. Sharp Null Hypothesis Testing

Suppose now that the prior density $\pi$ possesses a high concentration
around the null hypothesis $\theta = \theta_0$. Such densities may be
approximated by supposing that the statistician possesses a positive prior
probability $\phi$ that $H_0 : \theta = \theta_0$ is true, and, given that
$\theta \neq \theta_0$, that he possesses conditional prior density $q(\theta)$
for $\theta$. Prior distributions of this special nature, assigning a positive
prior probability to a "sharp null hypothesis," have been discussed by a number
of authors, notably Dickey (1968, 1975), Lindley (1957), and Schwarz (1978). The
definition in (1.1) may be adjusted to this type of formulation by saying that
the data support $H_0$ with respect to the prior if the posterior probability
that $H_0$ is true is greater than the prior probability $\phi$. This posterior
probability is given by

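$$\operatorname{prob}(H_0 \mid x) = \frac{\phi B}{\phi B + 1 - \phi} \tag{4.1}$$

where

$$B = \ell(\theta_0 \mid x) \Big/ \int_\Theta q(\theta)\, \ell(\theta \mid x)\, d\theta \tag{4.2}$$

is referred to as the "Bayes factor."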

Note from (4.1) and (4.2) that the data support $H_0$ if and only if $B > 1$,
and that this condition is equivalent to

$$q(\theta_0 \mid x) > q(\theta_0) \tag{4.3}$$

where

$$q(\theta_0 \mid x) = q(\theta_0)\, \ell(\theta_0 \mid x) \Big/ \int_\Theta q(\theta)\, \ell(\theta \mid x)\, d\theta \tag{4.4}$$

denotes the limit as $\theta \to \theta_0$ of the conditional posterior density
of $\theta$, given that $\theta \neq \theta_0$. Owing to the similarity between
(4.3) and (1.1) the results of sections 2 and 3 may be applied directly to this
conditional situation. They tell us that as $n \to \infty$ the condition in
(4.3) is asymptotically equivalent to


either (2.3) or (3.4). Therefore, whatever the value of $\phi$, we see that as
$n \to \infty$ the data will support $H_0$ if (2.3), or (3.4), is satisfied. The
adequacy of these approximations depends upon $q$ but not upon $\phi$. In other
words, if our prior density $\pi$ possesses a high concentration at
$\theta = \theta_0$, then this will not affect the adequacy of our asymptotic
approximations involving $\log n$. This property noticeably increases the
viability of our approximations.

Note that many of the previous results in the literature of Bayes factors
may be approximated by observing whether $\hat\theta$ lies within the adjustment
in (2.3) to $\sqrt{\log n}$ approximate standard deviations of $H_0$. This
highlights Lindley's paradox in a very general way; note that Lindley (1957)
describes particular examples where overwhelming rejection of $H_0$ via
significance testing is complemented by high Bayes factors in favour of $H_0$.
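
The paradox can be illustrated with a case in which the Bayes factor has a closed form. The sketch below assumes a normal sample mean with known variance sigma^2 and a conditional prior q that is normal, centred at the hypothesised value with variance tau^2; these modelling choices, the function names and the numbers are assumptions made purely for illustration.

```python
import math

def bayes_factor_normal(z, n, sigma=1.0, tau=1.0):
    """Bayes factor in favour of H0 for a normal mean with known variance.

    z = sqrt(n) * (xbar - theta0) / sigma is the usual test statistic, and
    the conditional prior under the alternative is N(theta0, tau^2).
    """
    r = n * tau ** 2 / sigma ** 2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z ** 2 * r / (1.0 + r))

def two_sided_p_value(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical case: z = 2.5 observed with n = 10,000 observations.
z, n = 2.5, 10_000
print(f"p-value = {two_sided_p_value(z):.4f}")
print(f"Bayes factor for H0 = {bayes_factor_normal(z, n):.2f}")
```

A result that is significant at the 5% level here corresponds to a Bayes factor above one, so the data support the null hypothesis in the sense used in this section.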

The theory of sharp null hypothesis testing can also be used for answering
question (B) as described in section 1. Following Dickey (1968) suppose that,
when $H_0$ is true, the statistician incurs extra loss $M_0$ by rejecting rather
than accepting $H_0$, and that, when $H_1$ is true, he incurs a loss $M_1$ by
accepting rather than rejecting $H_0$. Then the Bayes decision would tell him
to accept $H_0$ whenever the Bayes factor $B$ satisfies

$$B > c \tag{4.5}$$

where

$$c = (1 - \phi) M_1 / (\phi M_0) \tag{4.6}$$

and to reject $H_0$ otherwise. The condition in (4.5) is equivalent to
$q(\theta_0 \mid x) > c\, q(\theta_0)$ where $q(\theta_0 \mid x)$ is defined in
(4.4). The asymptotic arguments of sections 2 and 3 are again appropriate since
the constant $c$ is readily absorbed as $n \to \infty$. As a refinement, we
conclude that we should accept $H_0$ if and only if

$$\frac{|\hat\theta - \theta_0|}{n^{-1/2} v^{1/2}} < \{\log n - 2\log c - \log(2\pi v)\}^{1/2} \qquad (n \to \infty). \tag{4.7}$$


We therefore recommend that, when the statistician is deciding whether to
take the step of rejecting $H_0$, he should refer to a criterion which depends
upon a constant $c$. This constant is different in spirit from a significance
level; its specification should be based upon two costs and a prior probability.
More importantly, once $c$ is specified we see that the right hand side of (4.7)
depends upon $n$ and therefore differs from standard significance testing
procedures. The latter do not therefore agree in asymptotic terms with this
sensibly formulated procedure for answering question (B).
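
The decision rule can be sketched as follows. The sketch assumes that the constant $c$ is the ratio of (1 - prior probability of the null) times the loss from wrongly accepting, to the prior probability times the loss from wrongly rejecting, and that the bound takes the form $\log n - 2\log c - \log(2\pi v)$; the function names and the numerical losses are hypothetical.

```python
import math

def loss_ratio_constant(phi, m0, m1):
    """c = (1 - phi) * M1 / (phi * M0): phi is the prior probability of H0,
    M0 the extra loss from wrongly rejecting, M1 from wrongly accepting."""
    return (1.0 - phi) * m1 / (phi * m0)

def accept_null(theta_hat, theta0, v, n, c):
    """Accept H0 when the squared standardised distance of the estimate from
    theta0 is below log n - 2 log c - log(2*pi*v)."""
    z_squared = n * (theta_hat - theta0) ** 2 / v
    bound_squared = math.log(n) - 2.0 * math.log(c) - math.log(2.0 * math.pi * v)
    return z_squared, bound_squared, z_squared < bound_squared

# Hypothetical Bernoulli example: proportion 0.51 in n = 10,000 trials.
p_hat, n = 0.51, 10_000
v = p_hat * (1 - p_hat)
for m1 in (4.0, 20.0):
    c = loss_ratio_constant(phi=0.5, m0=1.0, m1=m1)
    z2, bound2, accept = accept_null(p_hat, 0.5, v, n, c)
    print(f"c = {c:.1f}: z^2 = {z2:.2f}, bound^2 = {bound2:.2f}, accept H0: {accept}")
```

Raising the loss attached to wrongly accepting the null hypothesis increases $c$ and tightens the bound, whereas a significance level has no comparable interpretation.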
5. Multivariate Procedures


Assume now that a vector $\boldsymbol\theta_q = (\theta_1, \ldots, \theta_q)$ of
$q$ parameters possesses likelihood function $\ell(\boldsymbol\theta_q \mid x)$,
given $x = (x_1, \ldots, x_n)$. Suppose that for $l \le q$ the classical
significance tester wishes to test the null hypothesis
$H_0 : \boldsymbol\theta_l = \boldsymbol\xi_l$ against the alternative hypothesis
$H_1 : \boldsymbol\theta_l \neq \boldsymbol\xi_l$, where $\boldsymbol\theta_l$
denotes the subvector of the first $l$ elements of $\boldsymbol\theta_q$, and
none of the nuisance parameters $\theta_{l+1}, \ldots, \theta_q$ are specified or
restricted under either $H_0$ or $H_1$.


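We extend the definition in (1.1) by taking the data to support $H_0$ with
respect to a prior density $\pi(\boldsymbol\theta_q)$ for $\boldsymbol\theta_q$ if
the posterior density $\pi(\boldsymbol\theta_l \mid x)$ of $\boldsymbol\theta_l$
is greater than the prior density $\pi(\boldsymbol\theta_l)$ when evaluated at
$\boldsymbol\theta_l = \boldsymbol\xi_l$.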

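As $n \to \infty$, the posterior density $\pi(\boldsymbol\theta_q \mid x)$ of
$\boldsymbol\theta_q$ is asymptotically approximated by a multivariate normal
density with mean vector equal to the maximum likelihood vector
$\hat{\boldsymbol\theta}_q$ and covariance matrix equal to the likelihood
dispersion matrix $n^{-1} V_q$ which possesses inverse

$$n V_q^{-1} = -\left.\frac{\partial^2 \log \ell(\boldsymbol\theta_q \mid x)}{\partial(\boldsymbol\theta_q \boldsymbol\theta_q^T)}\right|_{\boldsymbol\theta_q = \hat{\boldsymbol\theta}_q}. \tag{5.1}$$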


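Integrating out the nuisance parameters $\theta_{l+1}, \ldots, \theta_q$ we find
that the posterior density $\pi(\boldsymbol\theta_l \mid x)$ of
$\boldsymbol\theta_l$ is asymptotically multivariate normal with mean vector
$\hat{\boldsymbol\theta}_l$ and covariance matrix $n^{-1} V_l$, where
$\hat{\boldsymbol\theta}_l$ is the maximum likelihood vector of
$\boldsymbol\theta_l$, and $n^{-1} V_l$ is the first $l \times l$ submatrix on the
diagonal of $n^{-1} V_q$. Hence, by analogy with the method of section 2, we find
that as $n \to \infty$ the data will support $H_0$ whenever

$$n (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)^T V_l^{-1} (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l) < l \log n - l \log(2\pi |V_l|^{1/l}). \tag{5.2}$$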
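
The step leading to (5.2) can be spelled out as follows. This is a sketch under the same assumptions as section 2, writing $l$ for the number of parameters under test, $V_l$ for the corresponding $l \times l$ block of the dispersion matrix, and dropping the prior term in the last line because it does not grow with $n$.

```latex
\begin{align*}
\pi(\boldsymbol\xi_l \mid x)
  &\approx (2\pi)^{-l/2}\, \lvert n^{-1} V_l \rvert^{-1/2}
     \exp\{-\tfrac{1}{2}\, n (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)^T V_l^{-1}
     (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)\}
   > \pi(\boldsymbol\xi_l), \\
\lvert n^{-1} V_l \rvert &= n^{-l} \lvert V_l \rvert, \\
n (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)^T V_l^{-1}
     (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)
  &< l \log n - l \log(2\pi) - \log \lvert V_l \rvert - 2 \log \pi(\boldsymbol\xi_l) \\
  &\sim l \log n - l \log\bigl(2\pi \lvert V_l \rvert^{1/l}\bigr)
   \qquad (n \to \infty).
\end{align*}
```

For $l = 1$ and $V_l = v$ the last line reduces to the square of the bound in (2.3).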


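The statistic on the left hand side of (5.2) is seldom employed by classical
testers, unless $l = q$ or $V_q$ is diagonal, since the matrix $V_l$ does not
otherwise occur under a likelihood ratio approach. The likelihood of
$\boldsymbol\theta_q$ may be asymptotically approximated by

$$\ell(\boldsymbol\theta_q \mid x) \sim \ell(\boldsymbol\theta_q \mid x)\big|_{\boldsymbol\theta_q = \hat{\boldsymbol\theta}_q} \exp\{-\tfrac{1}{2} n (\boldsymbol\theta_q - \hat{\boldsymbol\theta}_q)^T V_q^{-1} (\boldsymbol\theta_q - \hat{\boldsymbol\theta}_q)\} \qquad (n \to \infty). \tag{5.3}$$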


Replacing $\boldsymbol\theta_l$ in (5.3) by $\boldsymbol\xi_l$ and the remaining
parameters by their maximum likelihood estimates, then dividing through by the
first term on the right hand side, we find that the likelihood ratio for the
test defined at the beginning of this section possesses the asymptotic behaviour

$$L(\boldsymbol\xi_l \mid x) \sim \exp\{-\tfrac{1}{2} n (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)^T W_l^{-1} (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)\} \qquad (n \to \infty) \tag{5.4}$$


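where $n W_l^{-1}$ represents the first $l \times l$ submatrix on the diagonal of
the information matrix $n V_q^{-1}$. By similar arguments to those described in
section 2 it is straightforward to relate the asymptotic behaviour of the
likelihood ratio to the posterior density of $\boldsymbol\theta_l$ by

$$L(\boldsymbol\xi_l \mid x) \sim \frac{(2\pi)^{l/2} |V_l|^{1/2}}{n^{l/2}} \exp\{-\tfrac{1}{2} n \Delta_l\}\, \pi(\boldsymbol\xi_l \mid x) \qquad (n \to \infty) \tag{5.5}$$

where

$$\Delta_l = (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l)^T (W_l^{-1} - V_l^{-1}) (\boldsymbol\xi_l - \hat{\boldsymbol\theta}_l). \tag{5.6}$$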


Note that the expression for $\Delta_l$ in (5.6) will always be non-negative
since the matrix $W_l^{-1} - V_l^{-1}$ is positive semi-definite. This follows
from the representation

$$W_l^{-1} = V_l^{-1} + H G^{-1} H^T$$

based upon the partition

$$V_q^{-1} = \begin{pmatrix} W_l^{-1} & H \\ H^T & G \end{pmatrix}$$

of the matrix satisfying (5.1).
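
The non-negativity of the extra term is easy to see in the smallest case. The following sketch assumes two parameters in total with one under test, so that every block of the partition is a scalar; the function name and the numerical information matrix are hypothetical and serve only to illustrate the representation above.

```python
def delta_two_parameters(info, xi, theta_hat):
    """info = [[w_inv, h], [h, g]] plays the role of the partitioned inverse
    dispersion matrix with one tested and one nuisance parameter.

    Returns (w_inv, v_inv, delta), where v_inv = w_inv - h*h/g is the inverse
    of the marginal dispersion of the tested parameter and
    delta = (xi - theta_hat)^2 * (w_inv - v_inv) is never negative.
    """
    (w_inv, h), (h_check, g) = info
    if abs(h - h_check) > 1e-12:
        raise ValueError("the matrix must be symmetric")
    if g <= 0 or w_inv * g - h * h <= 0:
        raise ValueError("the matrix must be positive definite")
    v_inv = w_inv - h * h / g
    delta = (xi - theta_hat) ** 2 * (w_inv - v_inv)
    return w_inv, v_inv, delta

# Hypothetical values: tested parameter estimated as 0.0, hypothesised as 0.3.
w_inv, v_inv, delta = delta_two_parameters([[2.0, 1.0], [1.0, 1.0]], 0.3, 0.0)
print(f"W^-1 = {w_inv}, V^-1 = {v_inv}, Delta = {delta:.4f}")
```

The difference between the two inverses is the square of the off-diagonal element divided by the nuisance block, which vanishes only when the tested and nuisance parameters are asymptotically uncorrelated.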


It follows immediately from (5.5) that, as $n \to \infty$, the data will support
$H_0$ whenever

$$-2 \log L(\boldsymbol\xi_l \mid x) < l \log n - l \log(2\pi |V_l|^{1/l}) + n \Delta_l. \tag{5.7}$$

Under $H_0$, the quantity on the left hand side of (5.7) possesses a
sampling distribution which is asymptotically chi-squared with $l$ degrees of
freedom. We however see from the right hand side of (5.7) that there is an
increased problem in working with the likelihood ratio rather than the
quadratic term in (5.2). This is because of the addition of the extra term
$n \Delta_l$ whenever nuisance parameters are present. In many examples
$\Delta_l$ will tend to a constant positive value as $n \to \infty$. Therefore
for large $n$ the contribution $n \Delta_l$ could dominate even the term
$l \log n$. We conclude that fixed-size likelihood ratio tests with nuisance
parameters present may be even more overready to reject the null hypothesis than
the tests discussed in section 2 for single parameter situations.
6. An alternative to the chi-squared goodness of fit test

Suppose that cell frequencies $x_1, \ldots, x_s$ possess a multinomial
distribution with corresponding cell probabilities $\theta_1, \ldots, \theta_s$,
summing to unity, and sample size $n$, and suppose also that we wish to compare
$H_0 : \theta_1 = \xi_1, \ldots, \theta_s = \xi_s$ with an unrestricted
alternative hypothesis. Then a straightforward application of the result in
(5.2) for the $s - 1$ distinct parameters $\theta_1, \ldots, \theta_{s-1}$ tells
us that as $n \to \infty$ and the $p_j = x_j / n$ remain fixed and positive the
data will support $H_0$ whenever

$$n \sum_{j=1}^{s} p_j^{-1} (p_j - \xi_j)^2 < (s - 1) \log n - (s - 1) \log\Bigl\{2\pi \Bigl(\prod_{j=1}^{s} p_j\Bigr)^{1/(s-1)}\Bigr\}. \tag{6.1}$$

Note firstly that the standard chi-squared statistic should be replaced
by its modification on the left hand side of (6.1) and that for large enough
$n$ the data will support $H_0$ whenever the modified statistic is less than
$\log n$ times the usual degrees of freedom. It might be interesting to compare
this with a limiting result by Leonard (1977) which suggests that in preliminary
testing situations the critical value should be twice the degrees of
freedom, though the purpose of the analysis is then very different. Other
aspects of significance testing are discussed by Leonard and Ord (1976) and
Leonard (1978).
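
The goodness of fit rule can be sketched for a small table. The sketch assumes that the bound is the one obtained from (5.2) when the determinant of the multinomial dispersion matrix for the $s - 1$ free proportions is the product of all $s$ observed proportions, that is $(s-1)\log n - (s-1)\log 2\pi - \sum_j \log p_j$; that form, the function names and the counts are assumptions of this illustration.

```python
import math

def modified_chi_squared(counts, xi):
    """n * sum_j (p_j - xi_j)^2 / p_j, with the observed proportions
    p_j = x_j / n (rather than the hypothesised ones) in the denominator."""
    n = sum(counts)
    return n * sum((x / n - e) ** 2 / (x / n) for x, e in zip(counts, xi))

def gof_supports_null(counts, xi):
    """Compare the modified statistic with the assumed log n based bound."""
    if any(x <= 0 for x in counts):
        raise ValueError("all cell frequencies must be positive")
    n, s = sum(counts), len(counts)
    log_det = sum(math.log(x / n) for x in counts)  # assumed: log|V| = sum_j log p_j
    bound = (s - 1) * (math.log(n) - math.log(2.0 * math.pi)) - log_det
    stat = modified_chi_squared(counts, xi)
    return stat, bound, stat < bound

# Hypothetical four-cell table with n = 10,000 and equal hypothesised probabilities.
stat, bound, ok = gof_supports_null([2650, 2400, 2500, 2450], [0.25] * 4)
print(f"modified statistic = {stat:.2f}, bound = {bound:.2f}, data support H0: {ok}")
```

In this example the statistic exceeds 7.81, the 5% point of chi-squared with three degrees of freedom, but remains well below the bound that grows with $\log n$.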


General  Conclusions 


The  phenomena  discussed  here  would  seem  to  be  particularly  relevant  to 
the  area  of  scientific  reporting.  For  example,  many  results  in  applied 
journals  which  might  have  been  viewed  as  "significant,"  because  they  yield  a 
low  p-value,  may  indeed  yield  evidence  in  support  of  the  null  hypothesis 
and  in  fact  serve  to  detract  from  the  very  scientific  theory  which  they  claim 
to  substantiate.  Efforts  should  perhaps  be  made  to  reduce  the  role  of 
significance  tests  in  application  areas  of  statistics  (e.g.  psychology, 
medicine, and sociology). It seems evident that low p-values should no
longer  be  viewed  as  a  prerequisite  for  the  acceptance  of  a  scientific  theory. 

All too often, significance tests are employed by applied workers
either in an attempt to formally confirm results which they already thought to
be true intuitively speaking, or to assist in a search for a model which
provides a good fit to the data, or to help extract a few "significant"
conclusions from a large and maybe noisy data set. We feel that much more
emphasis should be placed in applied circles on developing a model which is
meaningful in the real-life context of the practical problem at hand. The
techniques involving $\sqrt{\log n}$ described in this paper may be useful in
checking that the data give credibility to the model. When $n$ is large, they
permit a larger range of viable null hypotheses than experienced under
fixed-size significance testing. It should therefore be easier to find a
data-credible model which is also reasonable in real-life terms.